Slides on ATP binding sites as example for the power (or lack of power) of convergent evolution
Slides on ancient paralogs
Info on the Gaia hypothesis: http://en.wikipedia.org/wiki/Gaia_hypothesis
Goals Class 5:
- Appreciate that without selection keeping sequences from accumulating substitutions one could not find sequence similarities between sequences that diverged a few hundred million years ago.
- Know a few reasons why protein sequences work better to assess similarity than nucleotides
- What are inteins, and which enzymatic activities do they have?
- What are the possible symbiotic relationships between organisms, genes, or protein domains?
Slides on inteins,
Word document on substitutions over time, Excel spreadsheets on Jukes-Cantor sequence divergence (aa and nucs)
Wikipedia on the Gaia hypothesis (especially the three sections on criticism)
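The Jukes-Cantor calculation from the spreadsheets can also be sketched in a few lines of Python. This is a minimal sketch, assuming equal substitution rates between all states (4 for nucleotides, 20 for amino acids); the function names are made up for illustration.

```python
import math

def jc_expected_difference(d, n_states=4):
    """Expected fraction of differing sites after d substitutions per site."""
    k = (n_states - 1) / n_states          # saturation level: 0.75 or 0.95
    return k * (1.0 - math.exp(-d / k))

def jc_distance(p, n_states=4):
    """Estimated substitutions per site from an observed fraction p of differing sites."""
    k = (n_states - 1) / n_states
    if p >= k:
        raise ValueError("observed difference is at or beyond saturation")
    return -k * math.log(1.0 - p / k)

# Nucleotides saturate at 75% difference, amino acids only at 95%.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(d, round(jc_expected_difference(d, 4), 3), round(jc_expected_difference(d, 20), 3))
```

The later saturation with 20 states is one of the reasons why protein sequences work better than nucleotides for divergent sequences.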
Goals Comp Lab 3:
Know how to
- identify domains in multi-domain proteins in Chimera;
- inspect protein DNA interactions;
- identify the major and minor groove in a DNA molecule;
- create a multiple sequence alignment based on aligned structures in Chimera
Goals Class 6:
- Know about the tree and coral metaphors to depict evolution
- Know the differences between Lamarck's and Darwin's theory
- Understand the power and the limitations of the tree of life image.
- Understand the contributions that Woese and Fox made to the classification of life, which molecule they used, and the domains they discovered
- Understand the relationship between the 3 domains, and how the tree of life was rooted.
Slides on the Coral of Life - the tangled tree, gene transfer, exchange groups
Goals Class 7:
- Know about metaphors for life's history, and why they were thought to be relevant (rhizome -, potato -, tree -, coral -, watershed of life)
- Know about the discussion on competition and cooperation, social Darwinism and mutual aid.
- Understand that Darwin, while an abolitionist, was a child of his time.
- Know at least some processes in evolution that go beyond gradual evolution by natural selection
- Understand the problems that result from ownership of database entries.
- Appreciate the difference between supervised databanks and simple repositories.
Slides on Mutual Aid, Natural selection, and metaphors to describe Life's history.
Goals Comp Lab 4:
- Know about bibliography software
- Know about the advantage of the databanks accessible through NCBI's Entrez.
- Be able to perform literature databank searches at Google Scholar, Scopus, and PubMed
- Know how to retrieve full length manuscripts
- Appreciate the usefulness of the number of publications, the number of citations, and the H-index
- Know how to access manuscripts similar to one that you know is relevant to you.
- Appreciate that GenBank is highly redundant
- Know that searches at the protein level are more effective than searches at the nucleotide level.
Goals Class 8:
- Understand the difference between P and E values
- Know about "usual" cut-offs for Z-scores, P- and E-values.
- Be able to discuss the process that may lead to the decay of significance
- Know what fishing expeditions are about
See the yellow box on the web page for class 8
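The relationship between P and E values can be illustrated with a short calculation. This is a minimal sketch, assuming that the number of chance matches follows a Poisson distribution, so that the probability of at least one chance match is P = 1 - exp(-E); the function names are made up for illustration.

```python
import math

def p_from_e(e_value):
    """Probability of at least one chance match, given the expected number E."""
    return -math.expm1(-e_value)      # 1 - exp(-E), accurate for small E

def e_from_p(p_value):
    """Expected number of chance matches, given the probability P."""
    return -math.log1p(-p_value)      # -ln(1 - P)

# For small values P and E are nearly identical; for large E, P approaches 1.
for e in (1e-10, 0.01, 0.5, 1.0, 10.0):
    print(e, p_from_e(e))
```

P can never exceed 1, whereas E can be any non-negative number.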
Goals Class 9:
- Understand the problems that result from ownership of database entries (know a few examples).
- Problems and advantages of databanks with a gate keeper
- Understand how the databanks at the NCBI are different from flatfile and relational databanks.
- Appreciate the meaning of, and the relationship between, E and P values
- Understand that BLAST performs local alignments only
- Know what false positives and false negatives are in relation to a BLAST search
- Know what the Bonferroni correction is, and why it is not popular.
- Know who Margaret Dayhoff was and what contributions she made to modern molecular evolution and bioinformatics.
See the slides on the history of genbank and on blast searches
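The Bonferroni correction can be sketched as follows. This is a minimal sketch; the family-wise error formula assumes independent tests, and the function names are made up for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at or below alpha."""
    return alpha / n_tests

def familywise_error(alpha_per_test, n_tests):
    """Chance of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha_per_test) ** n_tests

# Twenty uncorrected tests at 0.05 give a high chance of at least one false positive.
print(familywise_error(0.05, 20))
# After correction each individual test has to pass a much stricter cut-off.
print(bonferroni_threshold(0.05, 20))
```

The stricter per-test cut-off also increases the number of false negatives, which is one reason the correction is unpopular.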
Goals Comp Lab 5:
- Learn how to log into the xanadu cluster
- Use simple unix commands (cd, pwd, cat, more)
- learn how to redirect the output of a unix command to a file
- Perform pairwise sequence comparisons with PRSS and blast
- Databank searches with FASTA and blast.
- Appreciate the advantage of uniprot90 and uniprot50 for some questions you might try to answer through a databank search.
- Be aware that homology does not always extend to the complete protein sequence
Goals Class 10:
- Be able to discuss the advantages of the command-line in general and blast searches via the command-line in particular.
- Know which substitution matrices to use for comparing similar and for divergent sequences
- Understand the different types of error as applied to databank searches
- Know how to adjust significance levels of individual experiments to avoid fishing expeditions.
See the slides on blast searches
Goals Class 11:
- Appreciate that Y-chromosome Adam and Mitochondrial Eve were not the only contributors to the gene pool of modern humans
- Understand that the about 20,000 compatriots of mitochondrial Eve contributed their genetic information to today's humans (but because of recombination these contributions cannot be traced back in the genealogy).
- Know that the same is true for tracing genes in the tree of life (different genes trace to different ancestors).
- Know about the variety of archaic human admixtures.
- Appreciate that gene transfer has two consequences for reconstructing phylogenies: A) individual genes do not necessarily trace organismal evolution; B) genes that were acquired from a divergent lineage, and that are present in all descendants of the recipient, provide a good marker to group together the organisms that have these atypical genes.
- Know about the role of marine Prochlorococcus
See slides here
Goals Comp Lab 6:
- Understand how to run a blast search from the command-line.
- Be able to create a searchable database from a multiple fasta sequence file
- Know about different output formats, and be able to import blast search results into an Excel spreadsheet.
- Understand that %identity is not a good choice to assess significant sequence similarity.
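As an alternative to a spreadsheet, tabular BLAST output can be read with a short script. This is a minimal sketch, assuming the search was run with the 12 default columns of BLAST+ tabular output (-outfmt 6); the two example hits and the E-value cut-off of 1e-5 are invented for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def parse_blast_tabular(handle):
    """Read tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                                  # skip blank and comment lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits

def significant_hits(hits, max_evalue=1e-5):
    """Keep hits by E-value, not by percent identity."""
    return [hit for hit in hits if hit["evalue"] <= max_evalue]

# Invented data: a short 95% identical match and a long 28.5% identical match.
example = io.StringIO(
    "q1\ts1\t95.00\t20\t1\t0\t1\t20\t5\t24\t0.5\t30.1\n"
    "q1\ts2\t28.50\t400\t270\t12\t3\t398\t10\t405\t1e-40\t160.0\n"
)
hits = parse_blast_tabular(example)
print([hit["sseqid"] for hit in significant_hits(hits)])
```

In this made-up example the hit with the higher %identity is the one that fails the E-value cut-off.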
Goals Class 12:
- Understand genome structure of prokaryotic genomes (Ori, leading/lagging strand, terminus of replication).
- Know two explanations for the preponderance of recombination events between points equidistant from the origin of replication: co-occurrence of recombination with replication; and AIMS and strand bias are not disrupted (i.e., these are the only recombination events that do not lead to a drop in fitness).
- Know how to compare two bacterial or archaeal chromosomes using a gene plot.
- Understand the power of simple script to run programs and to process the output of programs
- Know what is meant by GC strand bias, and why this is not a violation of G and C occurring in double stranded DNA with the same frequency (1st Chargaff rule)
- Know the role of AIMS in replication.
- Appreciate the use of hashes as an efficient data structure linking keys and values
Slides here
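GC strand bias and the use of hashes can be illustrated together. This is a minimal sketch, assuming the usual definition of GC skew as (G - C) / (G + C) counted on one strand in non-overlapping windows; Python's Counter is a hash that links each base (key) to its count (value). The toy sequence is invented for illustration.

```python
from collections import Counter

def gc_skew(sequence, window=1000):
    """GC skew (G - C) / (G + C) for consecutive non-overlapping windows of one strand."""
    skews = []
    for start in range(0, len(sequence) - window + 1, window):
        counts = Counter(sequence[start:start + window].upper())  # base -> count
        g, c = counts["G"], counts["C"]
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

# Toy sequence: G-rich first half, C-rich second half, as on the two sides of an origin or terminus.
toy = "GGGC" * 3 + "CCCG" * 3
print(gc_skew(toy, window=4))
```

The skew is measured along a single strand; in the double stranded molecule every G still pairs with a C.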
Goals Class 13:
- Different ways new genes can arise in evolution
- Different ways genes can be duplicated (individual genes, tandem duplications, via mRNA intermediate, as part of whole or partial genome duplications, alloploidy, gene transfer)
- Know about the fate of duplicated genes (non-functionalization, sub-functionalization, neo-functionalization)
- Appreciate the frequency with which these events occur (non-functionalization -- most of the time, sub-functionalization -- most of the time when both genes remain functional, neo-functionalization -- rare)
- Know that gene duplication followed by non-functionalization can lead to post-mating reproductive isolation
- Know what to study for the midterm.
-------------------------------------------------------------------------------------
Goals Comp Lab 7:
Become familiar with
- the commandline,
- making searchable libraries from multiple sequence files
- running blast,
- simple commands (cat xxx yyy > zzz) and
- scripts to modify files and to parse and plot blast search results
Goals Class 14:
- Review session. Know what to expect on the midterm.
Goals Comp Lab 8:
- Know how dotlet works, and why this is sometimes advantageous over the dot-matrix comparison done in a pairwise blast search.
- Understand the concept of sequence space, and how it may be used to visualize relationships between proteins and to discuss protein evolution.
- Know that a principal component analysis projects a multidimensional space onto only a few dimensions.
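The idea behind a dot-matrix comparison can be sketched in a few lines. This is a simplified illustration, assuming a sliding window in which identical residues are counted; it is not dotlet's actual scoring scheme, and the window size and threshold are arbitrary choices.

```python
def dotplot(seq_a, seq_b, window=3, min_matches=3):
    """Return one text row per window in seq_a; '*' marks window pairs that pass the threshold."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            matches = sum(a == b for a, b in zip(seq_a[i:i + window], seq_b[j:j + window]))
            row += "*" if matches >= min_matches else "."
        rows.append(row)
    return rows

# A repeated segment in the second sequence shows up as two parallel diagonals.
for line in dotplot("MKTAYIAK", "MKTAYIAKMKTAYIAK"):
    print(line)
```

Raising the window size or the threshold removes background dots; lowering them reveals weaker similarities.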
Goals Class 15:
- Understand the controversy about the term monophyletic, and how the different interpretations lead to different taxonomies.
- Know the difference between local and global alignments.
- Know what gaps, insertions, repeated domains, and regions of low complexity look like in a dotplot analysis.
- Understand how dynamic programming can guarantee an alignment with an optimal alignment score in case of a pairwise alignment.
- Understand the principle of the progressive alignment approach and the potential downstream problems caused by this analysis.
- Appreciate that multiple sequence alignments can have different goals: pleasing to the human observer; matching sites that in the 3D structure occupy the corresponding location; be certain that alignment columns only contain homologous sites (else align them to gaps).
Slides on cladistics
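Dynamic programming for a global pairwise alignment (the Needleman-Wunsch approach) can be sketched as follows. This is a minimal sketch, assuming a simple match/mismatch score and a linear gap penalty; the scoring values are arbitrary illustrations, not recommended settings.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (optimal score, aligned a, aligned b) for a global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap                      # a aligned against leading gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap                      # b aligned against leading gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

print(needleman_wunsch("ACGT", "AGT"))
```

Every cell holds the best score for aligning the two prefixes, which is why the final cell is guaranteed to hold an optimal score under the chosen scoring scheme.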
Goals Class 16:
- Appreciate the very different understanding of inheritance at Darwin's time
- The process of inheritance of acquired characteristics that we describe with the term of Lamarckism was generally accepted at Darwin's time, and Darwin also subscribed to this idea
- At Darwin's time the ideas associated with Lamarck not only included the inheritance of acquired characteristics, but also a teleological driver for changes
- Note the similarity between some versions of the Holobiont idea and the pangenesis description of inheritance and variation
Goals Comp Lab 9:
- Appreciate the power of scripts to run programs repeatedly using different input files
- Appreciate the power of scripts to extract information from the verbose output of a program
- Know how simple statistics on all protein sequences present in a genome can inform on the physiology and adaptations of an organism
Goals Class 17:
- Know what the terms global and local alignment refer to
- Understand the steps in a progressive alignment
- Know that the Needleman Wunsch approach to building a pairwise alignment is guaranteed to find an optimal alignment
- Be aware of the advantages (better alignment between sites that correspond in a structural alignment) of a multiple versus a pairwise alignment
- Know about the problems associated with a progressive alignment.
- Be aware that a "good looking" alignment might not be the best unbiased starting point for phylogenetic reconstruction
- Know about different alignment programs (clustalw, Muscle, MAFFT, PRANK)
Box on alignment in Class 17
Goals Class 18:
- Know about the introns-early versus introns-late debate
- Know about the supporting evidence for both sides
- Know how Go plots are created, and how they are used to define protein structure building blocks
- Know why the finding of the intron in the triosephosphate isomerase encoding gene of Culex is not an argument for introns early
- Understand the parenthesis or Newick format to denote trees
- Know the principle behind parsimony analysis and Occam's razor (or Ockham's razor, aka lex parsimoniae)
- Know the similarity and differences between parsimony and maximum likelihood based phylogenetic reconstruction
Slides on introns early versus late
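The principle behind parsimony analysis can be illustrated with Fitch's counting procedure for a single alignment column. This is a minimal sketch, assuming a rooted binary tree written as nested tuples, which mirrors the parenthesis (Newick) notation ((A,B),(C,D)); the taxon names and character states are invented for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one character."""
    if isinstance(tree, str):                      # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                                     # children agree: no extra change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

column = {"A": "G", "B": "G", "C": "T", "D": "T"}
print(fitch((("A", "B"), ("C", "D")), column)[1])   # tree ((A,B),(C,D))
print(fitch((("A", "C"), ("B", "D")), column)[1])   # tree ((A,C),(B,D))
```

Parsimony prefers the tree that needs the fewest changes summed over all columns; maximum likelihood instead scores each tree by the probability of the data under an explicit substitution model.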
Goals Comp Lab 10:
- Know different algorithms to align sequences
- Learn how to navigate the seaview graphics user interface
- Understand that ancient gene duplications were used to root the tree of life
- Recognize that bootstrapping provides an easy way to discard some of the splits in a phylogenetic reconstruction as not meaningful
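The resampling step of a bootstrap analysis can be sketched as follows. This is a minimal sketch, assuming the alignment is stored as a dictionary of equally long sequences; each replicate draws alignment columns with replacement, and a tree would then be reconstructed from every replicate (not shown). The toy alignment is invented.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return a pseudo-replicate built by sampling alignment columns with replacement."""
    n_columns = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}

toy_alignment = {"taxon1": "ACGTAC", "taxon2": "ACGAAC", "taxon3": "TCGAAT"}
rng = random.Random(1)                 # fixed seed so that the example is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(replicates[0])
```

The support value of a split is the fraction of replicate trees in which that split is recovered.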
Goals Class 19:
- Know about the different ways the tree of life can be rooted
- Know the reasons why a gene tree might be different from the species tree
- Be able to read bipartition tables
- Know the basic structure with which programs in PHYLIP work.
- Know about the relationship between maximum likelihood, posterior and prior probability
Slides on phylogenetic reconstruction
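The relationship between likelihood, prior, and posterior probability follows from Bayes' theorem. This is a minimal sketch for a small set of competing hypotheses (for example three tree topologies); the prior and likelihood values are invented for illustration.

```python
def posterior_probabilities(priors, likelihoods):
    """Posterior = prior * likelihood, normalized over all hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}

priors = {"tree1": 1 / 3, "tree2": 1 / 3, "tree3": 1 / 3}     # no topology favored a priori
likelihoods = {"tree1": 0.002, "tree2": 0.001, "tree3": 0.001}  # P(data | tree), made up
print(posterior_probabilities(priors, likelihoods))
```

With equal priors the hypothesis with the maximum likelihood also has the highest posterior probability; unequal priors can change the ranking.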
Goals Class 20:
- Understand the principle behind the Metropolis Coupled Markov Chain Monte Carlo approach to biased sampling in Bayesian exploration of parameter space.
- Understand that the shape parameter of the Gamma distribution allows one to estimate the distribution of rate variation along a sequence.
- Know how lack of resolution, lineage sorting, gene transfer, introgression, and systematic artifacts can lead to differences between gene and "species" trees.
Slides on MC Robot and causes for species gene tree difference
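The biased sampling of a Markov chain can be illustrated with a plain, single-chain Metropolis sampler. This is a simplified sketch, not the Metropolis-coupled version with heated chains; it assumes a toy problem (the posterior of a coin's heads probability with a uniform prior) in place of tree and model parameters, and the step size, chain length, and burn-in are arbitrary choices.

```python
import math
import random

def log_posterior(theta, heads, tosses):
    """Log of the unnormalized posterior with a uniform prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return heads * math.log(theta) + (tosses - heads) * math.log(1.0 - theta)

def metropolis(heads, tosses, n_steps=20000, step=0.1, seed=1):
    """Return the chain of sampled theta values."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tosses)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.uniform(-step, step)
        proposed = log_posterior(proposal, heads, tosses)
        # Always accept uphill moves; accept downhill moves with probability equal to the ratio.
        if proposed >= current or rng.random() < math.exp(proposed - current):
            theta, current = proposal, proposed
        samples.append(theta)
    return samples

chain = metropolis(heads=7, tosses=10)
kept = chain[2000:]                      # discard the first samples as burn-in
print(sum(kept) / len(kept))             # posterior mean estimated from the chain
```

The chain visits parameter values in proportion to their posterior probability, so simple statistics on the retained samples estimate posterior quantities.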
Goals Comp Lab 11:
- Appreciate the long branch attraction artifact as a serious problem in reconstructing evolutionary history
- Know which approaches to phylogenetic reconstruction are most, and which are least, sensitive to this artifact
- Know that missing data can cause a problem in phylogenetic reconstruction, especially if the data missing in some sequences evolve under a different substitution rate
Goals Class 21:
- Know the difference between mutation and substitution.
- Understand why for neutral mutations the mutation rate equals the substitution rate.
- Understand that even with very large populations, most mutations that provide a small selective advantage go extinct due to genetic drift.
- Understand how population size impacts the time it takes for fixation of a neutral mutation
- Know what the terms positive, negative, and neutral selection mean and what frequent synonyms for these terms are.
slides on population genetics, genetic drift (slides 12 - 36)
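The statements on fixation can be checked with a short calculation. This is a minimal sketch, assuming Kimura's diffusion approximation for a new mutation in a diploid population whose effective size equals its census size N, and handling only neutral or advantageous mutations (s >= 0); the function names and numbers are made up for illustration.

```python
import math

def fixation_probability(s, n):
    """Fixation probability of a new mutation with advantage s in a diploid population of size n."""
    if s < 0:
        raise ValueError("this sketch only handles s >= 0")
    if s == 0:
        return 1.0 / (2 * n)                       # neutral case: initial frequency 1/(2N)
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)

def neutral_substitution_rate(mu, n):
    """New mutations per generation (2N * mu) times their fixation probability (1 / 2N)."""
    return 2 * n * mu * fixation_probability(0, n)

# A 1% advantage in a population of one million: about 2s, so roughly 98% are lost by drift.
print(fixation_probability(0.01, 10**6))
# For neutral mutations the population size cancels out.
print(neutral_substitution_rate(1e-8, 10**3), neutral_substitution_rate(1e-8, 10**6))
```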
Goals Class 22:
- Know how to infer the type of selection using synonymous and non-synonymous substitutions.
- Know that one can infer the type of selection from the rate with which a gene goes to fixation.
- Be able to discuss the terms positive and diversifying selection
- Know about the parameters in Ziheng Yang's model to incorporate dN/dS into sequence evolution
- Know that the dN/dS>1 approach can be used to detect positive selection and that this approach is often difficult to apply in case the alignment is unreliable (which results in more non-synonymous substitutions).
Slides for dN/dS and Bayesian analyses (up to slide 19 - 47)
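The distinction between synonymous and non-synonymous changes can be illustrated by classifying the codon differences between two aligned coding sequences. This is a deliberately crude sketch, assuming the standard genetic code and gap-free, in-frame sequences; it only counts differing codons, whereas a real dN/dS estimate also normalizes by the number of synonymous and non-synonymous sites and corrects for multiple substitutions. The example sequences are invented.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated with bases in the order T, C, A, G.
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, non-synonymous) counts of differing codons."""
    synonymous = nonsynonymous = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous

print(count_codon_changes("TTTGGGAAA", "TTCGGAAGA"))
```

A misaligned region pairs non-homologous codons, which mostly differ in their encoded amino acid, so unreliable alignments inflate the non-synonymous count.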
Goals Comp Lab 12:
- Have an idea how a Bayesian analysis using the mcmc approach works
- Understand the concept of a burnin
- Know how the samples from the Markov chain can be used to calculate posterior probabilities and confidence intervals.
- Understand the relation between dN/dS (aka omega) ratios and type of selection
- Know that positive selection is rare, but that it can be detected using the dN/dS values
- Understand how the dN/dS for individual sites can be estimated.
- Know that the program tracer can be used to evaluate parameter files resulting from Bayesian mcmc analyses.
Goals Class 23:
- Know that the dN/dS<1 approach to detect purifying selection may not always reflect a selection for function.
- Understand that selection can act at the level of the gene (selfish genetic elements), at the level of the individual in the population, and possibly through the competition between populations and communities.
- Know about GTAs and the contribution that sex and recombination make to evolution
- Know what an evolutionarily stable strategy refers to
- Understand the link between positive selection and selective sweep.
- Know different approaches of how selective sweeps can be detected.
- Know that HGT, genome duplications, and introgression can lead to non-gradual evolution
- Know examples for biochemical pathways likely assembled through HGT.
Slides class 23
Goals Class 24:
- Appreciate the power of PSI-blast to find divergent homologs
- Know the principle behind PSI blast searches
- Be aware of the problems that a corruption of the scoring matrix causes
- Know what the consequence of this corruption means for the E-value of a match.
- Understand the meaning of the E-value cut-off for inclusion in the next iteration
- Know the meaning of the E-value of a hit in a PSI blast search
- Know about the problem in estimating the expected numbers of false positives in PSI blast searches
- Appreciate that building the profile / scoring matrix can use a different databank as compared to the final search
- Know a few possible applications for PSI or Hmmer searches
- Know about the problems associated with shotgun genome sequencing of multicellular organisms that live with their symbionts and other bacteria - these genome sequences should be considered metagenome sequences.
- Understand that binning based on composition and coverage can help identify genes from the host and differentiate them from bacterial genes. (But might classify genes recently transferred to the host as belonging to bacteria.)
- Understand the demonstration that the UNC tardigrade genome contigs contained many genes not part of the tardigrade genome.
Slides class 24
Goals Comp Lab 12:
- Know that the starting sequence impacts the number of homologs that can be detected.
- Know that using a PSSM in a tblastn search can detect decaying molecular parasites.
- See class 24 for more
Goals Class 25:
- Know about different approaches to detect HGT events (phylogenetic conflict and using BLAST as proxy, gene presence absence, composition based)
- Know about the advantages and disadvantages of using bipartition based analysis to identify phylogenetic conflict.
- Understand why embedded quartets are advantageous.
- Know about the difference between supertree and supermatrix approaches, and appreciate the advantages and disadvantages of each approach.
Slides class 25